AITopics | binaural signal

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Mixture-of-Experts Framework for Field-of-View Enhanced Signal-Dependent Binauralization of Moving Talkers

Mittal, Manan; Deppisch, Thomas; Forrer, Joseph; Sueur, Chris Le; Ben-Hur, Zamir; Alon, David Lou; Wong, Daniel D. E.

arXiv.org Machine Learning, Sep-26-2025

We propose a novel mixture-of-experts framework for field-of-view enhancement in binaural signal matching. Our approach enables dynamic spatial audio rendering that adapts to continuous talker motion, allowing users to emphasize or suppress sounds from selected directions while preserving natural binaural cues. Unlike traditional methods that rely on explicit direction-of-arrival estimation or operate in the Ambisonics domain, our signal-dependent framework combines multiple binaural filters in an online manner using implicit localization. This allows for real-time tracking and enhancement of moving sound sources, supporting applications such as speech focus, noise reduction, and world-locked audio in augmented and virtual reality. The method is agnostic to array geometry, offering a flexible solution for spatial audio capture and personalized playback in next-generation consumer audio devices.
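The idea of combining multiple binaural filters with signal-dependent weights can be illustrated with a minimal sketch. This is not the authors' method: the gate logits, the recursive smoothing factor, and the function name `mix_experts` are assumptions made for illustration, and the sketch handles a single STFT frequency bin per call.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def mix_experts(x, filters, logits, prev_gate=None, smooth=0.8):
    """Blend K binaural filters for one STFT bin (hypothetical sketch).

    x         : (M,) complex microphone-array signal for this bin
    filters   : (K, 2, M) complex, one array-to-binaural filter per expert
    logits    : (K,) real gate scores (assumed to come from some
                signal-dependent scoring stage not shown here)
    prev_gate : (K,) gate from the previous frame, for online smoothing
    smooth    : recursive smoothing factor in [0, 1) (assumed value)
    """
    gate = softmax(np.asarray(logits, dtype=float))
    if prev_gate is not None:
        # Convex combination keeps the gate a valid weighting over experts.
        gate = smooth * prev_gate + (1.0 - smooth) * gate
    expert_out = filters @ x   # (K, 2): left/right output of each expert
    y = gate @ expert_out      # (2,): gated sum over experts
    return y, gate

# Usage with random placeholder data: 3 experts, 4 microphones.
rng = np.random.default_rng(0)
filters = rng.standard_normal((3, 2, 4)) + 1j * rng.standard_normal((3, 2, 4))
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
y, gate = mix_experts(x, filters, logits=np.array([0.2, 1.5, -0.3]))
```

Because the gate is a convex weighting, the output stays within the span of the individual expert outputs, which is one plausible reason to blend filters rather than switch between them abruptly.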

binaural signal, enhancement, microphone array, (12 more...)

arXiv:2509.13548

Country: North America > United States > New York > Suffolk County > Stony Brook (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.46)
Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (0.34)

Array2BR: An End-to-End Noise-immune Binaural Audio Synthesis from Microphone-array Signals

Chi, Cheng; Li, Xiaoyu; Li, Andong; Ke, Yuxuan; Li, Xiaodong; Zheng, Chengshi

arXiv.org Artificial Intelligence, Oct-8-2024

Telepresence technology aims to provide an immersive virtual presence for remote conferencing, which makes high-quality binaural audio synthesis essential. Because ambient noise is often unavoidable in practice, it is highly desirable to obtain noise-free binaural audio directly from microphone-array signals. To this end, this paper proposes a new end-to-end noise-immune binaural audio synthesis framework for microphone-array signals, abbreviated as Array2BR. Experimental results show that the framework maps binaural cues correctly while suppressing noise effectively. Compared with existing methods, it achieves better objective and subjective metric scores.

array2br, international conference, module, (14 more...)

arXiv:2410.05739

Country: Asia > China > Beijing > Beijing (0.05)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Communications (1.00)
Information Technology > Human Computer Interaction (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Binaural Rendering of Ambisonic Signals by Neural Networks

Zhu, Yin; Kong, Qiuqiang; Shi, Junjie; Liu, Shilei; Ye, Xuzhou; Wang, Ju-chiang; Zhang, Junping

arXiv.org Artificial Intelligence, Nov-4-2022

Binaural rendering of ambisonic signals is of broad interest to virtual reality and immersive media. Conventional methods often require manually measured Head-Related Transfer Functions (HRTFs). To address this issue, we collect a paired ambisonic-binaural dataset and propose an end-to-end deep learning framework. Experimental results show that neural networks outperform the conventional method in objective metrics and achieve comparable subjective scores. To validate the proposed framework, we experimentally explore different settings of the input features, model structures, output features, and loss functions. Our proposed system achieves an SDR of 7.32 and MOSs of 3.83, 3.58, 3.87, and 3.58 in the quality, timbre, localization, and immersion dimensions, respectively.
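For readers unfamiliar with the objective metric quoted above, the following is a minimal sketch of one common signal-to-distortion ratio (SDR) definition, the energy of the reference divided by the energy of the error, expressed in decibels. The abstract does not say which SDR variant the authors used, so this plain definition and the helper name `sdr_db` are assumptions.

```python
import numpy as np

def sdr_db(reference, estimate, eps=1e-12):
    """Plain SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    reference : ground-truth signal (for binaural audio, any array shape
                works as long as both inputs match, e.g. (2, T))
    estimate  : rendered or predicted signal with the same shape
    eps       : small constant to avoid division by zero (assumed value)
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    signal_power = np.sum(reference ** 2)
    error_power = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10((signal_power + eps) / (error_power + eps))

# Usage: an estimate that is the reference scaled by 0.9 has an error
# amplitude of 10 percent of the reference.
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
reference = np.sin(2.0 * np.pi * 5.0 * t)
score = sdr_db(reference, 0.9 * reference)
```

Higher values mean the estimate is closer to the reference; other SDR variants (for example scale-invariant ones) can give different numbers for the same pair of signals.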

artificial intelligence, machine learning, rendering, (18 more...)

arXiv:2211.02301

Country:

Asia > Middle East > Israel (0.04)
Asia > China > Jiangsu Province > Xuzhou (0.04)

Genre: Research Report (0.84)

Industry: Health & Medicine (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

End-to-End Binaural Speech Synthesis

Huang, Wen Chin; Markovic, Dejan; Richard, Alexander; Gebru, Israel Dejene; Menon, Anjali

arXiv.org Artificial Intelligence, Jul-8-2022

In this work, we present an end-to-end binaural speech synthesis system that combines a low-bitrate audio codec with a binaural decoder. The decoder is capable of accurate speech binauralization while faithfully reconstructing environmental factors such as ambient noise and reverb. The network is a modified vector-quantized variational autoencoder, trained with several carefully designed objectives, including an adversarial loss. We evaluate the proposed system on an internal binaural dataset with objective metrics and a perceptual study. Results show that the proposed approach matches the ground truth data more closely than previous methods. In particular, we demonstrate the capability of the adversarial loss in capturing environment effects needed to create an authentic auditory scene.
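The quantization step at the heart of a standard vector-quantized autoencoder can be sketched as a nearest-neighbour lookup in a codebook. The abstract describes a modified architecture and gives no details, so the shapes, the squared-Euclidean distance, and the function name `quantize` below are illustrative assumptions rather than the authors' design.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry.

    latents  : (T, D) encoder outputs, one D-dimensional vector per frame
    codebook : (N, D) code vectors (learned in a real system)
    Returns (indices, quantized): the (T,) integer code indices and the
    (T, D) code vectors that a decoder would consume.
    """
    latents = np.asarray(latents, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance between every latent and every code: (T, N).
    diffs = latents[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    indices = np.argmin(dists, axis=1)
    return indices, codebook[indices]

# Usage with a tiny hand-written codebook of 3 two-dimensional codes.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
latents = np.array([[0.9, 1.2], [0.1, -0.2], [-0.8, 1.7]])
indices, quantized = quantize(latents, codebook)
```

Only the integer indices need to be stored or transmitted, at a cost of about log2(N) bits per frame for a codebook of N entries, which is why this kind of quantizer pairs naturally with a low-bitrate codec.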

decoder, discriminator, proc, (14 more...)

arXiv:2207.03697

Country:

North America > United States (0.04)
Asia > Middle East > Israel (0.04)
Asia > Japan (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.62)
